High quality genome assembly of the anhydrobiotic midge provides insights on a single chromosome-based emergence of extreme desiccation tolerance

Abstract Non-biting midges (Chironomidae) are known to inhabit a wide range of environments, and certain species can tolerate extreme conditions, where the rest of insects cannot survive. In particular, the sleeping chironomid Polypedilum vanderplanki is known for the remarkable ability of its larvae to withstand almost complete desiccation by entering a state called anhydrobiosis. Chromosome numbers in chironomids are higher than in other dipterans and this extra genomic resource might facilitate rapid adaptation to novel environments. We used improved sequencing strategies to assemble a chromosome-level genome sequence for P. vanderplanki for deep comparative analysis of genomic location of genes associated with desiccation tolerance. Using whole genome-based cross-species and intra-species analysis, we provide evidence for the unique functional specialization of Chromosome 4 through extensive acquisition of novel genes. In contrast to other insect genomes, in the sleeping chironomid a uniquely high degree of subfunctionalization in paralogous anhydrobiosis genes occurs in this chromosome, as well as pseudogenization in a highly duplicated gene family. Our findings suggest that the Chromosome 4 in Polypedilum is a site of high genetic turnover, allowing it to act as a ‘sandbox’ for evolutionary experiments, thus facilitating the rapid adaptation of midges to harsh environments.


INTRODUCTION
Chironomids (non-biting midges) are one of the most widely distributed groups of insects and include cosmopolitan species and others living in niche adverse environments, where they are subjected to extremes of pH, salinity, or temperature, as well as desiccation (1,2). As these adaptation capabilities are lineage specific rather than conserved widely throughout chironomids, the extent of this rapid acquisition of novel tolerance capabilities suggests that there may be a fundamental mechanism that supports such evolution.
One of the most striking examples of adaptation of chironomids to extreme environment is the 'sleeping chironomid' Polypedilum vanderplanki (3). Upon desiccation, the larvae enter an ametabolic reversible dry state--anhydrobiosis. Loss of water causes devastating damage to living cells, and therefore anhydrobiotes have acquired various protective mechanisms to withstand such damage (4)(5)(6)(7)(8)(9)(10)(11)(12)(13)(14)(15)(16). We have previously showed that anhydrobiosis is mediated by evolution of unique protective gene groups highly specific for this insect, tandemly duplicated forming multiple gene clusters within the genome termed Anhydrobiosis Related Islands (ARIds) (11). These regions contain genes encoding intrinsically disordered protein LEAs, antioxidant GSTs, globins, etc., many of which have been implicated to anhydrobiosis. In comparison, the genome of the non-anhydrobiotic relative P. nubifer has no similar regions, thus suggesting that the genomic evolutions underlying anhydrobiosis acquisition is extremely rapid (∼50MYA). Together, we hypothesized the existence of a genomic feature that enabled such adaptations.
Genomic evolution in Diptera has been extensively studied using the Drosophila complex (17,18). Additional resources has also implied genome variance through expansion/contraction of genome size and constant karyotypes (19). Chironomids in general have smaller genome size (100-200 Mb) compared to the average of dipterans and insects (19,20). On the contrary, most chironomid species have additional chromosomes (chironomids: 2n = 6-16, mosquitoes: 2n = 6, flies : 2n = 6-8) (21,22). Several lineages even show large diversity in chromosome numbers within a single genus (22). These observations may imply a potential mechanism where chironomids genomes may have evolved through chromosomal rearrangements rather than genome enlargement and have utilized these additional genome resources for the environmental adaptations discussed above.
P. vanderplanki has four chromosomes (2n = 8) (23), of which Chromosome 4 has the highest karyotypic diversity and genetic distance in comparison with the recently described anhydrobiotic close relative Polypedilum pembai (23)(24)(25). This chromosome differs from the 'dot chromosome' found in Drosophila, as they are highly repetitive, heterochromatic and harbors very few genes. Chromosome 4 is half in length of the other three chromosomes and has multiple active transcriptional regions (Balbiani rings), thus would consist of much more active genes (23). These ob-servations imply Chromosome 4 as a starting point to determine the basis of rapid genomic evolution that enabled anhydrobiosis in P. vanderplanki. A key factor that would be a candidate for such genomic adaptations would be the ARId loci; how are these loci spaced within the chromosome structures? What facilitated the emergence of such loci? The answers to these questions may address genome adaptations against extreme environmental stress in not only Polypedilum, but in chironomids, possibly insects as well. A chromosomal level genome assembly would be required to conduct such comprehensive inter-chromosome analysis; however, our previous genome assembly of P. vanderplanki was fragmented, hindering such analysis.
To this end, by using a combination of long-read sequencing technologies and Hi-C scaffolding, we assembled a chromosome-level genome of P. vanderplanki. We also applied Cap-trap RNA-Seq (CTR-Seq) to concurrently identify gene borders and promoter regions (26), which would provide rich information to following studies aimed to validate anhydrobiosis machinery identified by genomic analysis. We found that anhydrobiosis-associated genes are localized preferably on Chromosome 4, including seven out of the nine ARId loci. Using comprehensive inter-species analysis, we confirmed that most of the genes localized in this chromosome are highly specific for P. vanderplanki, and do not have orthologs in other Diptera. Combined with the evidence of pseudogenization in a highly duplicated gene family and unexpectedly high mutation load compared to other chromosomes, our findings suggest that the Chromosome 4 in chironomids is a unique site of high genetic turnover, allowing it to act as a 'sandbox' for evolutionary experiments, thus facilitating the rapid adaptation of midges to harsh environments.

MATERIALS AND METHODS
Details can be found in SI Appendix. . Hi-C libraries were processed according to the general protocol as described previously (27) and library constructed with NEB-Next Ultra II DNA Library Prep Kit for Illumina (New England Biolabs) and sequenced on a HiSeq 2500 system (Illumina, 100 bp, paired end). Library lengths were assayed with Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA). In addition, P. vanderplanki larvae and Pv11 cells were exposed to a variety of stress inducers (larvae: UVC 100 mJ/cm, Pv11 cells: 20.5% (v/w) NaCl, 600 mM mannitol, 600 mM trehalose, 20 M paraquat and 42 • C heat shock, anhydrobiosis cycle). Exposed larvae/cells were sampled at designated time points and total RNA was extracted with RNAiso Plus (Takara Bio, Kyoto, Japan). The sequencing library was constructed with TruSeq RNA Sample Prep Kit -v8 and was submitted to sequencing with HiSeq1500. Finally, total RNA for insect and Pv11 undergoing anhydrobiosis were also submitted to CTR-Seq library construction following a previous study (26).

Genome assembly and gene prediction
We employed a meta-assembly to obtain a highly contiguous and complete genome assembly. Long read based genome assemblies were produced with HGAP4 and DBG2OLC (28)(29)(30), which were used for meta-assembly of an Illumina-based Platanus assembly (31). This assembly was scaffolded with Hi-C reads, resulting in the final assembly. The initial gene prediction was produced with the Braker pipeline using all RNA-Seq data stated above (32). Furthermore, Nanopore-cDNA-Seq data produced by CTR-Seq was used to predict the final Pv 5.2.4 gene set using Talon (33). This gene set was validated for completeness with BUSCO (34). The genome was submitted to variant calling with the GATK pipeline using our previous DNA-Seq data of wild P. vanderplanki populations, inbred NIAS01 and Pv11 (11,24,35), and Sniffles for PacBio data (36). Statistics for variants comparison were calculated with PoPoolation2, snpEff and bcftools (37)(38)(39). General features of the genome (i.e. GC ratio, Gene and repeat content, DNA-Seq and RNA-Seq coverage) were calculated in 50 or 100 kb windows with BEDtools v2.29.2-39-g5210e6f (40) or in-house Perl scripts using G-language GAE (41).

Comparative genomics
Dipteran genomes and transcriptome assemblies were obtained from ENSEMBL Metazoa, NCBI and other data servers. For genomes without gene predictions, the Augustus gene model created during a BUSCO run was used for ab initio gene prediction (42). Transcriptome assemblies with low BUSCO scores were submitted to re-assembly with Trinity (43) to obtain higher quality transcriptome assemblies. The longest isoforms for each gene were submitted to OrthoFinder clustering (44). The amino acid sequences for each genome were pooled and bidirectional hits were determined with Diamond blastp v0.9.24 (45) and the number of genes conserved between species were calculated with in-house Perl scripts. In addition, 1-to-1 ortholog were determined for Drosophila melanogaster to identify D. melanogaster essential gene orthologs. Each chromosome was tested for enrichment or depletion of Dm-essential gene orthologs or BUSCO genes. Colinear blocks were detected with McScanX and visualized with synvisio (46,47). dN/dS ratio between P. vanderplanki and P. pembai orthologs were calculated with codeml in the PAML package (48) based on MAFFT alignments (49,50). Gene ontology enrichment analysis for several gene groups were performed with GOStats v2.52.0 (51).

Transcriptome analysis
Gene expression was quantified using RSEM in the Trinity package and were submitted to differential expression analysis with DESeq2 and edgeR (52)(53)(54). Genes with FDR < 0.05 and FC > 2 was designated as differentially expressed genes. Gene expression profiles were clustered based on the Spearman correlation, and genes in each cluster were submitted to gene ontology enrichment analysis using GO-Stat and visualized with Revigo (51,55).

A chromosomal-scale genome assembly for Polypedilum vanderplanki
We assembled a chromosome-level genome for P. vanderplanki--the first chromosome-level genome in any anhydrobiotic animals--utilizing Illumina, PacBio and Hi-C sequencing data from P. vanderplanki-derived Pv11 cell line and larvae (Additional Data S1, S2 and S3, Additional Text S1). The 98.8% of the resulting 118.9 Mb assembly is arranged in the four largest scaffolds (Table  1), corresponding to its four chromosomes ( Figure 1A). High BUSCO scores were observed (Diptera lineage, Complete 95.7%, Missing 3.6%, Additional Text S1) and repeat sequences detection combined with Hi-C contact mapping suggested the existence of putative centromeric and telomeric regions, respectively ( Figure 1B, Additional Data S4, Additional Text S1). Comparison of Pv5.2 with our previous genome assembly Pv0.9 did not show extensive genomic changes that would be anticipated from sequencing the Pv11 cell line (Additional Text S2, Additional Figure S2).
We combined all available RNA-Seq data and full-length cDNA Nanopore sequencing from Cap Trap RNA-Seq  Text S4). There was no significant enrichment of horizontally transferred genes in any specific chromosomes (Additional Text S4).
These statistics indicate that this genome assembly has greatly improved contiguity, completeness and quality compared to our previous genome resources, and is compatible for comparison with other Diptera genomes.

General analysis revealed unusual features of Chromosome 4
Upon gene model annotation and general analysis, we noticed that there are unexpected, but clear inequalities among chromosomes in terms of several key characteristics. The majority of previously identified anhydrobiosis genes (i.e. Lea and Lil genes) are located on Chromosome 4; seven out of the nine ARIds were located on Chromosome 4 ( Figure  1A, Additional Data S7). Intra-chromosomal synteny regions were increased on Chromosome 4 than other chromosomes ( Figure 1A). Several transposable elements (TEs) in were enriched in Chromosome 4 (Additional Data S8). Chromosome 4 has lower average chromosome-wide GC ratio ( Figure 1C) and at gene level ( Figure 1D, Tukey HSD, FDR < 0.001). These features implied Chromosome 4 may harbor other features that may have provided the basis of genomic evolution in P. vanderplanki.
Hypothesizing that this decrease in GC nucleotides may be related to mutation accumulation, we profiled the genome for variants with our previous Pool-Seq data of wild P. vanderplanki populations (Additional Data S9, Additional Figure S4). We observed increased pairwise nucleotide diversity in the 3-12 Mb region of Chromosome 4 ( Figure 1E, Additional text S5), implying relaxed selective pressure in these regions. On the contrary, lower Ts/Tv ra-tio in Chromosome 4 were observed (Additional Data S9, Additional Figure S4d, Tukey HSD, FDR < 0.01), suggesting a biased substitution profile compared to other chromosomes.
Together, Chromosome 4 has a highly biased nucleotide composition and accumulation of variants compared to other chromosomes, suggesting an increase in the genetic diversity of Chromosome 4, possibly caused by relaxed selective pressure. Thus we hypothesized that Chromosome 4 may specifically harbor traces of genomic adaptations that enabled anhydrobiosis in this species.

Chromosome 4 lack synteny blocks conserved in other Diptera but is not of completely novel origin
We then proceeded to compare the genomic content of P. vanderplanki with that of publicly available dipterans and chironomids to identify genetic content specific to P. vanderplanki, and more interestingly, Chromosome 4 (Additional Data S10, Additional Text S6 and S7).
As we have hypothesized that the decrease in GC ratio may be related to the genomic evolution occurring in P. vanderplanki, we first assayed the GC ratio within Diptera. We observed low GC ratio in multiple chironomids compared to other Diptera (Figure 2A). Genomes with lower GC ratio showed lower GC ratio at the coding sequences level as well, suggesting a genome-wide adaptation. Interestingly, the desiccation-sensitive P. nubifer and the Antarctic midge Belgica antarctica did not show such reductions (11,57), inferring a dynamic accumulation of G/C nucleotides after their divergence with other lineages.
We then analyzed the gene conservation ratio, e.g. number of genes in a chromosome that a homolog could be detected in other dipterans. Genes on Chromosome 4 (3 241 genes) showed lower conservation (30-40%) with chironomids outside of the subfamily Chironominae, compared to those on other chromosomes (55-70%, Figure 2A). Using other major dipteran chromosome-level genomes (Aedes aegypti, Anopheles gambiae, D. melanogaster) did not show such decreases in conservation ratios in a single autosome, with the exception of the Y chromosome in D. melanogaster. Using draft genome and transcriptome assemblies of chi- ronomids (Additional Text S6, Additional Data S11) also indicated similar conservation profiles (Additional Figure  S5a, Additional Text S7). Besides, by using ab initio gene predictions for all genomes, we verified that using different gene prediction methods did not affect our results (Additional Text S7, Additional Figure S5b). These data suggests that Chromosome 4 may harbor a large number of species (or genus) specific genes. These data led us to consider whether Chromosome 4 genes were evolutionary conserved from a common Diptera ancestor. Thus, we assayed whether synteny blocks would be detected between P. vanderplanki and other dipterans. Collinear blocks were detected between P. vanderplanki and P. pembai in all chromosomes ( Figure 2B), but all other species studied lacked such blocks on Chromosome 4 (Culicomorpha and Chironomidae, Figure 2C, Additional Figure S5c). Results varied in close relatives (Chironomini tribe, e.g. P. nubifer, Chironomus tentans, Chironomus riparius, Additional Figure S3def). The lack of synteny blocks between P. vanderplanki and other chironomids may be due to the fragmentation of the genome assembly (i.e. P. nubifer has 9672 scaffolds, N50 length 26kbp), preventing correct synteny analysis. Assemblies with higher contiguity would be required for more comprehensive analysis.
This observation led us to question where the genes on Chromosome 4 genes originated from. By determining or-   Figure 1A.
thologs by bidirectional best hits, we observed that the genes on Chromosome 4 have orthologs across all chromosomes in D. melanogaster, A. aegypti, and A. gambiae (Additional Figure 5g, Additional Text S7), indicating that this chromosome was not obtained through horizontal transfer. This also suggested that inter-chromosomal translocations have occurred at some time point during the evolutionary trajectory of chironomids, creating a mosaic conservation pattern. Intriguingly, we observed that Chromosome 4 was significantly depleted of BUSCO genes (Diptera lineage) or Drosophila essential genes (Additional Data S13). Furthermore, higher dN/dS values were observed between 1-to-1 P. vanderplanki and P. pembai Chromosome 4 orthologs (Additional Figure S5h), suggesting relaxed selective pressure on Chromosome 4 genes. Together, these observations suggest that Chromosome 4 is not of novel origin and shares genes with all chromosomes in other Mosquitoes. While we observed relaxed selection on Chromosome 4 genes, we also found contradicting data suggesting the existence of a selective pressure to remove essential or highly conserved genes from Chromosome 4.

Novel genes on Chromosome 4 are enriched in ionotropic receptor pseudogenes
We then turned to novel genes on Chromosome 4; are there any functions that are enriched in genes located on this highly variable chromosome? Approximately 40% of the P. vanderplanki-or Polypedilum-specific genes mapped to Chromosome 4 ( Figure 3A, Additional Data S14, Additional Text S8), suggesting that this chromosome might function as a scaffold for acquisition of novel genes. We found 327 ortholog clusters, corresponding to 653 genes, with orthologs in all Polypedilum species but not in other dipterans or outgroups, of which 137 genes had a Swiss-Prot homolog (BLASTP, 1e-15). Gene ontology enrichment analysis of these genes based on InterPro annotations indicated enrichment of chemosensory functions and ion transportation ( Figure 3B, Additional Data S15). We also observed that Chromosome 4 harbored genes that were not expressed in any of the transcriptome samples sequenced in this study--at nearly twice the rate of other chromosomes (Table 3)--suggesting a higher number of pseudogenes on Chromosome 4.
We then filtered the ortholog clusters (Clustering 1, Additional Text S8) for Polypedilum-specific gene families. Intriguingly, eight out of the largest 15 clusters were transmembrane proteins, including the LIL protein family. Proteins in four clusters were annotated with the ionotropic receptor domain 11a or 20A, but BLASTP search against Swiss-Prot did not show homology to previously known ionotropic receptors (E-value < 1e-15 threshold). The largest ortholog cluster (OG0000168) was extensively duplicated in P. vanderplanki (117 copies, P. pembai three copies, P. nubifer one copy, Additional Data S16) with 45 copies on Chromosome 4 (significantly enriched, Fisher's exact test, p-value = 1.38 × 10 -7 ). Most orthologs (115/117 copies in OG0000168), were not expressed (average TPM < 1); we hypothesize that these are pseudogenes. High number copies of non-expressed genes also implied transposon activity; however, only one gene was predicted to have a Dfam domain (DR0206503, E-value = 8.5e-05), thus showing this protein family was not obtained from transposon activity. Calculating dN/dS values between all P. pembai and P. vanderplanki orthologs indicated an average of 0.51-0.55 (median 0.48-0.52, Figure 3C), considerably higher than dN/dS values of P. pembai and P. vanderplanki 1-to-1 orthologs (Additional Figure 5h) or gene families of similar size (24). These data support that this gene family may be under relaxed selection, due to pseudogenization or its multicopy nature.

Subfunctionalization of Lea orthologs on Chromosome 4 support anhydrobiosis
Finally, we concentrated on anhydrobiosis genes on Chromosome 4. All ARId loci were present in the genome, with seven out of the nine regions being placed on Chromosome 4 (Additional Data S7). Most of these regions are located at the latter half of Chromosome 4, which has higher nucleotide diversity and lower GC ratio. This observation emphasized the importance of the features of Chromosome 4 indicated above, as it may possibly shape the basis of the rapid environmental adaptation in various lineages.
ARIds are loci enriched in genes upregulated during anhydrobiosis entry. For example, the largest region, ARId 1, harbors 22 LEA and 14 LIL protein coding genes. Previous studies have identified paralog specific cellular localization, protein characteristics, and gene expression profiles of these paralogs (12,16). We extended this analysis by conducting transcriptome analysis of Pv11 cells subjected to various stresses (Additional Figure S6, Additional Data S17 and S18) and identified differentially expressed transcripts with stringent thresholds. Specific discussion on the analysis of transcriptome profiles can be found in the Additional text S9.
As previous studies have suggested, various genes in the ARId regions were differentially expressed, supporting our previous findings. We focused on the ARId1 locus, as it harbors the most anhydrobiosis related genes duplicated in tandem and on the same strand. Phylogenetic analysis has indicated many of the closely located Lea paralogs originate from a single ancestor, making it an ideal circumstance for generating subfunctionalization between paralogs. The Pv5.2.4 gene set contains 2 178 transcripts originating from ARId1 (corresponding to 50 genes). We detected 496 transcripts with an average TPM over 1, of which approximately 281 transcripts (42 genes) were differentially expressed in our time course transcriptome data. Grouping consecutive genes by their regulation profiles (e.g. regulated during either Mannitol/NaCl exposure or Trehalose preconditioning, or both) allowed us to divide them into four 'Blocks' (Figure 3D, Additional Text S10). We observed that the Blocks expressed widely across datasets had significantly higher isoelectric point (pI) values (average 6.87) of deduced proteins than the two other Lea Blocks (average 5.18 and 4.74; Tukey HSD, Figure 3E). The regulation groups were characterized by the presence of heat shock elements (HSEs): similarly, four out of five orthologs in Block 4 had HSEs in their upstream sequences. We have previously shown the importance of gene regulation by Heat Shock Factor (HSF) in Pv11 cells, thus identification of HSEs in   Block 4 Lea paralogs would support their contribution to anhydrobiosis. These findings suggests that paralogous anhydrobiosis loci, at least Lea genes in this case, may have undergone subfunctionalization to accommodate dynamic changes in cellular physiology that may occur during anhydrobiosis.

DISCUSSION
This study illustrates a single chromosome playing an important role in the rapid adaptation in P. vanderplanki, providing evidence of chromosome-specific mechanisms of evolution in insects in general. It is known that basal chironomid species have more diverse numbers of chromosomes (2n = 6-16) compared to other insects (22). Sys-tematic karyotypic studies of polytene chromosomes in chironomid midges have shown that a change in the basic chromosomal band sequence in arms A, B, C, D, E, F (representing Chromosomes 1-3) happens only when the basic chromosomal band sequence in arm G (Chromosome 4) also changes (58). Specifically, for chironomid midges, the fourth chromosome at the micromorphology level show the highest variation among populations exposed to different stresses (59). The high-order structure of the chromosomes may have facilitated this chromosome-level adaptations. Heterochromatin (HC) is prone to fuse with itself at low temperatures, leading to merging of chromosomes at chromocenters, tandem chromosomal merging, reciprocal exchange of chromosomal arms and the creation of new cytocomplexes. HC has a nonspecific reaction to environ-mental stresses, such as unusual salinity or temperature, which results in condensation-decondensation of HC at the nuclear membrane. Also, the presence of HC provides a high level of chromosomal reconstruction and a change in HC content in different species could have adaptive value (60).
We have yet to validate genome-wide heterochromatin structures; ongoing Hi-C analysis based on this chromosome level genome will be discussed elsewhere.
The analysis conducted in this study suggest that Chromosome 4 (in its 3-12 Mb region) may be under relaxed selective pressure, allowing the accumulation of heterozygotic mutations. This enabled the acquisition of novel genes as subfunctionalization after gene duplication is one route to the generation of genes with new functions. We find most of the ARIds to be in these regions, thus implicating the link between the variable Chromosome 4 and genomic adaptation against extreme stress. These observations suggest that the variable Chromosome 4 have promoted acquisition of novel gene sets that contributed to environmental adaptations. However, how adaptive variants that produced anhydrobiosis mechanisms became to be fixed on Chromosome 4 remains to be identified. Determining if the size and number of chromosomes correlates with higher diversity (61) would be another point that needs to be validated.
One element that may have caused the diversity in Chromosome 4 is DNA damage. We have previously observed extensive DNA damage during larvae anhydrobiosis and the induction of Rad51, an activator of double strand break repair during recovery (9). This implies the induction of homologous recombination repair (HR) or non-homologous end joining (NHEJ) during anhydrobiosis recovery. Our previous transcriptome analysis of Pv11 anhydrobiosis has revealed that NHEJ may be preferred in Pv11 cells (62), which if the same regulation is used in insects, would suggest some kind of regulation of HR. If this is the case, decrease of HR may have caused inhibition of GC-biased gene conversion, as it is linked with HR, thus suppressing the increase in GC ratio seen in other organisms. Furthermore, the biased substitution pattern indicated by the lower Ts/Tv values in Chromosome 4 may have been caused by the inhibition of GC-biased gene conversion. Lower Ts/Tv may also indicate oxidative stress caused during desiccation may have caused the oxidation of the guanosine nucleotide, causing G-to-T transversion, which were not correctly repaired. The relaxed selection pressure may have enhanced mutation accumulation caused by NHEJ repair or nucleotide-level DNA damage. Together, these mechanisms have driven the decrease in GC ratio and increase of nucleotide diversity in Chromosome 4.
Moreover, we have observed decrease in gene conservation in the Y sex chromosome of D. melanogaster, similar to what we observed in P. vanderplanki Chromosome 4. Sex chromosomes are known to have features similar to what we have observed in Chromosome 4, e.g. higher mutation rates, etc. (63). Environmental adaptation through genomic evolution on sex chromosomes in insects is rare, as we have found only one study showing thermal adaptation in the house fly Musca domestic linked with Y sex chromosome genes (64). The P. vanderplanki Chromosome 4 differs from the D. melanogaster sex chromosome as it contains much more genes and is not highly repetitive. Chironomids lack definitive sex chromosomes, queried through synteny analysis with Müller elements from Drosophila (65); however, sex determining regions (SDR) and sex biased regions have been identified by molecular and karyotyping studies. In C. riparius, reduced recombination rate has been observed in the Chromosome 3 SDR (66). In Polypedilum, a previous karyotyping analysis on P. nubifer has identified a sexbiased region on Chromosome 4 (67). This may suggest that the sex-biased region may be also conserved in P. vanderplanki Chromosome 4 and may be linked to the reduction in recombination. Furthermore, considering that the SDR is located on Chromosome 3 in C. riparius, establishment of Chromosome 4 as a sex chromosome may be recent. More studies based on this chromosome scale genome would provide a wide picture to the sex determination in this insect.
By comparison with an anhydrobiosis-incompetent relative, P. nubifer, we have previously shown that anhydrobiosis genes (i.e. those encoding LEA, LILs, thioredoxin proteins) have gone through massive duplication in the P. vanderplanki genome, forming a feature known as ARIds (11,12,16). Comparison with P. pembai suggested that the evolution of anhydrobiosis genes within the Polypedilum lineage may have occurred recently, at least after the divergence of P. vanderplanki and P. nubifer (63-100 MYA) (11,24,25,68). Most of these loci were located on Chromosome 4, emphasizing the importance of this chromosome for adaptation against desiccation.
Close examination of genomic loci on Chromosome 4 revealed two examples of genomic evolution on this chromosome. Chromosome 4 harbors majority of the Polypedilum or P. vanderplanki specific genes. From functional enrichment analysis, we identified a high copy number pseudogene family with weak homology with ionotropic receptors. Ionotropic receptor domains are thought to participate in chemical stimulus sensing (69). Chemosensory genes have been proposed to undergo subfunctionalization through duplication, facilitating the development of novel ecological traits (70). We observed relaxed selection in the orthologs, together with their expression profiles support these orthologs being pseudogenes. This data proposes that these genes may be remnants of gene duplication events that occurred after P. vanderplanki and P. pembai diverged approximately 33 MYA. It is possible that this protein family functioned as a desiccation sensor in the early stages of acquisition of the anhydrobiosis machinery, but eventually became unnecessary.
Similar to the accumulation of genus specific genes, Chromosome 4 harbors most of the ARId loci. In particular, ARId1 harbors the massively duplicated Lea and Lil genes. Previous studies have proposed that duplication of LEA protein genes allows their subfunctionalization to improve adaptation to desiccation (12,71). We have observed enrichment of HSEs in the upstream regions of Block 4 Lea paralogs and significantly higher pI values of Block 1 paralogs. These observations suggest the subfunctionalization in these paralogs; LEA proteins with pI values close to neutral are expressed when cells face osmotic stress, whereas paralogs with lower pI values are produced under other conditions. Intrinsically disordered proteins, in particular LEAs, have been implicated in various cellular mechanisms, including phase transition and protection of cellu-lar molecules in anhydrobiotes (72)(73)(74)(75)(76). A previous study using measles virus phosphoprotein showed that such proteins are less soluble at their isoelectric points (77). If this also is the case for P. vanderplanki LEA proteins, it may suggest that duplications of LEA protein genes have occurred to enable cellular protection when facing drastic changes in cellular pH, such as those that may occur during trehalose preconditioning and subsequent dehydration in Pv11 cells or desiccation in larvae. This subfunctionalization may have been facilitated by the relaxed selection taking place on Chromosome 4; the loci harboring multiple Pimt paralogs (ARId3) also shows similar signs (24). Moreover, the existence of HSEs in Block 4 Lea paralogs expands our knowledge on these ARId regions; we have previously identified HSF to be a major regulator of anhydrobiosis genes (78). Knock out experiments using the CRISPR/Cas9 system in Pv11 cells causes defects preventing cells from successfully entering anhydrobiosis (79). We anticipate examination of other ARId loci will present similar features linking currently identified transcription factors with effective proteins.
In conclusion, we present a chromosome-level genome assembly of the anhydrobiotic midge P. vanderplanki, and have observed for the first time, to our knowledge, the use of a single chromosome for trial-and-error genetic modifications leading to rapid environmental adaptation. The evolution of P. vanderplanki Chromosome 4 may have been fundamental to the rapid evolution of novel chironomid physiologies, i.e. tolerance to desiccation, by accumulation of novel genes. The relaxed selective pressure on Chromosome 4 has likely contributed to the rapidity of evolution. We provide two examples of such evolutionary experiments: subfunctionalization of the ARId1 Lea loci that underpin anhydrobiosis, and pseudogenization of a 117-copy ionotropic receptor gene family, possibly illustrating the evolutionary trajectory of anhydrobiosis in P. vanderplanki. Acquisition of such adaptive mechanisms may have enabled P. vanderplanki to inhabit environments where there is less competition. Together, this study demonstrates the genetic basis of desiccation tolerance within the Polypedilum genus and infers a similar mechanism for the adaptation to extreme environments in other chironomids. We await more chromosome-level chironomid genome sequences to verify the universality of additional chromosomes as a genetically plastic 'sandbox' in which to test evolutionary strategies for environmental adaptation in chironomids.

DATA AVAILABILITY
All sequencing data and the assembled genome obtained in this study has been uploaded to NCBI. A Jbrowsebased genome browser for the P. vanderplanki genome is hosted at MidgeBase2 (https://www.midgebase.org). Raw data and scripts used in this study were uploaded to a dedicated GitHub repository (https://github.com/Kikawada-Lab-UT-NARO/Pvanderplanki chromosomal genome).
The genome, DNA and CTR-Seq sequencing data and annotations have been uploaded to NCBI under the accession ID PRJNA660906. The transcriptome sequencing data have been uploaded to GEO under the accession ID GSE158443. This Whole Genome Shotgun project of P.
vanderplanki implemented with the Braker gene prediction (v5.2) has been deposited at DDBJ/ENA/GenBank under the accession JADBJN000000000. The version described in this paper is version JADBJN010000000.